XML index compression by DTD subtraction

Authors

  • Stefan Böttcher
  • Rita Hartel
  • Niklas Klein
Abstract

Whenever XML is used as a format to exchange large amounts of data, or even for data streams, the verbosity of XML is one of the bottlenecks. While compressing XML data seems to be a way out, it is essential for a variety of applications that the compressed result can be queried efficiently. Furthermore, efficient path query evaluation calls for an index, which usually requires an additional data structure. For this purpose, we have developed a compression technique that uses structure information found in the DTD to perform a structure-preserving compression of XML data, and that also compresses an index that allows for efficient search in the compressed data. Our evaluation shows that compression factors close to those of gzip are possible, whereas the structural part of XML files can be compressed even better.

Similar resources

BSBC: Towards a Succinct Data Format for XML Streams

XML data compression is an important feature in XML data exchange, particularly when the data size may cause bottlenecks or when bandwidth and energy consumption limitations require reducing the amount of the exchanged XML data. However, applications based on XML data streams also require efficient path query processing on the structure of compressed XML data streams. We present a succinct repr...


An Empirical Evaluation of Simple DTD-Conscious Compression Techniques

To avoid ambiguity, in this paper, the term “XML compression” is used in the first (and, we believe, original and most accurate) sense exclusively. We will compare our proposed techniques only with other approaches that address problem (1), not problems (2) or (3). Since XML markup often displays a high degree of redundancy, ordinary text compressors (gzip [7], bzip2 [15], etc.) are frequently ...


XCQ: XML Compression and Querying System

We present our development of an XML compression and querying tool, which is called XML Compression and Querying System (XCQ). This system is developed based on a novel technique called DTD Tree and SAX Event Stream Parsing (DSP). This technique is designed for efficient compression of XML documents that conform to a given DTD without involving user expertise. A reasonable compression ratio, wh...


Knowledge and Information Systems REGULAR PAPER

XML has already become the de facto standard for specifying and exchanging data on the Web. However, XML is by nature verbose and thus XML documents are usually large in size, a factor that hinders its practical usage, since it substantially increases the costs of storing, processing, and exchanging data. In order to tackle this problem, many XML-specific compression systems, such as XMill, XGr...


Algorithm for XML Compression using DTD and Stack

XML is the worldwide standard for data definition and is used extensively for developing SOA-based applications. An SOA-based application comprises many different applications that are integrated with each other, and XML documents are used to solve the resulting interoperability problem. XML is widely used for a variety of tasks, including configuration files, protocols, and web services. XML has problem with...




Publication date: 2007